 Thimphu


Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models

Simbeck, Katharina, Mahran, Mariam

arXiv.org Artificial Intelligence

Despite growing research on bias in large language models (LLMs), most work has focused on gender and race, with little attention to religious identity. This paper explores how religion is internally represented in LLMs and how it intersects with concepts of violence and geography. Using mechanistic interpretability and Sparse Autoencoders (SAEs) via the Neuronpedia API, we analyze latent feature activations across five models. We measure overlap between religion- and violence-related prompts and probe semantic patterns in activation contexts. While all five religions show comparable internal cohesion, Islam is more frequently linked to features associated with violent language. In contrast, geographic associations largely reflect real-world religious demographics, revealing how models embed both factual distributions and cultural stereotypes. These findings highlight the value of structural analysis in auditing not just outputs but also internal representations that shape model behavior.
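The overlap measurement described in this abstract can be illustrated with a minimal sketch. It assumes each prompt group has already been reduced to a set of top-activating SAE feature IDs. The feature IDs below are hypothetical, and Jaccard similarity is one plausible overlap measure, not necessarily the one the authors use.

```python
def jaccard_overlap(features_a: set[int], features_b: set[int]) -> float:
    """Return |A ∩ B| / |A ∪ B|, or 0.0 when both sets are empty."""
    union = features_a | features_b
    if not union:
        return 0.0
    return len(features_a & features_b) / len(union)


# Hypothetical feature IDs, standing in for the top-activating SAE
# features retrieved for each prompt group.
religion_features = {101, 205, 309, 412}
violence_features = {205, 412, 518, 777}

overlap = jaccard_overlap(religion_features, violence_features)
print(f"shared features: {sorted(religion_features & violence_features)}")
print(f"Jaccard overlap: {overlap:.3f}")
```

A higher score for one religion's prompts than for another's would correspond to the kind of asymmetry the abstract reports, though the paper's actual procedure may differ.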


Neural Combinatorial Optimization for Real-World Routing

Son, Jiwoo, Zhao, Zhikai, Berto, Federico, Hua, Chuanbo, Kwon, Changhyun, Park, Jinkyoo

arXiv.org Artificial Intelligence

Vehicle Routing Problems (VRPs) are a class of NP-hard problems ubiquitous in real-world logistics scenarios that pose significant challenges for optimization. Neural Combinatorial Optimization (NCO) has emerged as a promising alternative to classical approaches, as it can learn fast heuristics to solve VRPs. However, most NCO research on VRPs focuses on simplified settings that rely on unrealistic data distributions and do not account for asymmetric distances and travel durations, which cannot be derived from simple Euclidean distances. This hinders real-world deployment. This work introduces RRNCO (Real Routing NCO) to bridge the gap between synthetic and real-world VRPs in both data and modeling. First, we introduce a new, openly available dataset of real-world data containing a diverse set of locations, distances, and duration matrices from 100 cities, with actual routing distances and durations obtained from the Open Source Routing Machine (OSRM). Second, we propose a novel approach that efficiently processes both node and edge features through contextual gating, enabling the construction of more informed node embeddings. Finally, we incorporate an Adaptation Attention Free Module (AAFM) with neural adaptive bias mechanisms that effectively integrates not only distance matrices but also angular relationships between nodes, allowing our model to capture rich structural information. RRNCO achieves state-of-the-art results in real-world VRPs among NCO methods. We make our dataset and code publicly available at https://github.com/ai4co/real-routing-nco.
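To see why asymmetric matrices matter, consider a small sketch in which the same tour costs a different amount depending on direction, something a Euclidean distance matrix can never express. The duration values and the `route_cost` helper are hypothetical illustrations, not data or code from RRNCO.

```python
def route_cost(matrix: list[list[int]], route: list[int]) -> int:
    """Sum matrix[i][j] over consecutive stops, returning to the start."""
    legs = zip(route, route[1:] + route[:1])
    return sum(matrix[i][j] for i, j in legs)


# Hypothetical travel durations in minutes; durations[i][j] != durations[j][i],
# as can happen with one-way streets or turn restrictions.
durations = [
    [0, 4, 9],
    [6, 0, 3],
    [5, 8, 0],
]

forward = route_cost(durations, [0, 1, 2])   # 0->1 (4) + 1->2 (3) + 2->0 (5)
backward = route_cost(durations, [0, 2, 1])  # 0->2 (9) + 2->1 (8) + 1->0 (6)
print(forward, backward)  # prints "12 23"
```

With a symmetric matrix the two directions would always cost the same, so a model trained only on Euclidean instances never has to learn this distinction.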


LM-PUB-QUIZ: A Comprehensive Framework for Zero-Shot Evaluation of Relational Knowledge in Language Models

Ploner, Max, Wiland, Jacek, Pohl, Sebastian, Akbik, Alan

arXiv.org Artificial Intelligence

Knowledge probing evaluates the extent to which a language model (LM) has acquired relational knowledge during its pre-training phase. It provides a cost-effective means of comparing LMs of different sizes and training setups and is useful for monitoring knowledge gained or lost during continual learning (CL). In prior work, we presented an improved knowledge probe called BEAR (Wiland et al., 2024), which enables the comparison of LMs trained with different pre-training objectives (causal and masked LMs) and addresses issues of skewed distributions in previous probes to deliver a more unbiased reading of LM knowledge. With this paper, we present LM-PUB-QUIZ, a Python framework and leaderboard built around the BEAR probing mechanism that enables researchers and practitioners to apply it in their work. It provides options for standalone evaluation and direct integration into the widely used training pipeline of the Hugging Face TRANSFORMERS library. Further, it provides a fine-grained analysis of different knowledge types to assist users in better understanding the knowledge in each evaluated LM. We publicly release LM-PUB-QUIZ as an open-source project.


BEAR: A Unified Framework for Evaluating Relational Knowledge in Causal and Masked Language Models

Wiland, Jacek, Ploner, Max, Akbik, Alan

arXiv.org Artificial Intelligence

Knowledge probing assesses the degree to which a language model (LM) has successfully learned relational knowledge during pre-training. Probing is an inexpensive way to compare LMs of different sizes and training configurations. However, previous approaches rely on the objective function used in pre-training and are thus applicable to either masked or causal LMs, but not both. As a result, comparing different types of LMs becomes impossible. To address this, we propose an approach that uses an LM's inherent ability to estimate the log-likelihood of any given textual statement. We carefully design an evaluation dataset of 7,731 instances (40,916 in a larger variant) from which we produce alternative statements for each relational fact, one of which is correct. We then evaluate whether an LM assigns the highest log-likelihood to the correct statement. Our experimental evaluation of 22 common LMs shows that our proposed framework, BEAR, can effectively probe for knowledge across different LM types. We release the BEAR datasets and an open-source framework that implements the probing approach to the research community to facilitate the evaluation and development of LMs.
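The ranking step described in this abstract can be sketched as follows. This is a minimal illustration, not the BEAR implementation: the helper names are hypothetical, and the log-likelihood values come from a hard-coded toy table rather than a real LM. In practice, `log_likelihood` would be any function that scores a full statement with a causal or masked model.

```python
from typing import Callable, Sequence

Instance = tuple[Sequence[str], int]  # (alternative statements, index of the correct one)


def pick_best(statements: Sequence[str], log_likelihood: Callable[[str], float]) -> int:
    """Return the index of the statement with the highest log-likelihood."""
    scores = [log_likelihood(s) for s in statements]
    return max(range(len(scores)), key=scores.__getitem__)


def probe_accuracy(instances: Sequence[Instance], log_likelihood: Callable[[str], float]) -> float:
    """Fraction of instances where the correct statement is ranked first."""
    hits = sum(pick_best(stmts, log_likelihood) == answer for stmts, answer in instances)
    return hits / len(instances)


# Toy scores standing in for a real model (assumed values, for illustration only).
TOY_LOG_LIKELIHOODS = {
    "The capital of Bhutan is Thimphu.": -4.1,
    "The capital of Bhutan is Paris.": -9.7,
    "The capital of France is Thimphu.": -6.2,
    "The capital of France is Paris.": -7.5,
}


def toy_log_likelihood(statement: str) -> float:
    return TOY_LOG_LIKELIHOODS[statement]


instances = [
    (["The capital of Bhutan is Thimphu.", "The capital of Bhutan is Paris."], 0),
    (["The capital of France is Thimphu.", "The capital of France is Paris."], 1),
]

accuracy = probe_accuracy(instances, toy_log_likelihood)
print(accuracy)  # the toy model gets the first fact right and the second wrong
```

Because the probe only needs a score per complete statement, the same loop applies to any model type that can supply one, which is the property the abstract relies on to compare causal and masked LMs.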


Space Invaders at 40: What the game says about the 1970s – and today

The Independent - Tech

The Space Invaders arcade video game, celebrating its 40th anniversary, is a classic piece of software credited as one of the earliest digital shooting games. As a game designer and teacher of games, I know how meaning is carried from designer to the mechanics of play. As a game studies researcher, I also know how games reveal myth, meaning and culture. An analysis of Pac-Man, for instance, shows how that game embodies many values of its day – including consumerism, drug use and gender politics. The message in Space Invaders is as basic as the graphics: when faced with conflict, players have no option except to blast it away.


Apple HomePod is already losing the smart speaker battle

The Independent - Tech

The war for your digital home is raging. Apple has finally followed Amazon, Google and Microsoft by launching a smart speaker with a voice-controlled artificial intelligence assistant. Yet even though the "HomePod" is another technological marvel, there's a chance Apple is already losing the battle. The competition isn't just about the sound quality of the speaker – it is also about the other things that users can do with it. The most common requests to AI personal assistants such as Apple's Siri are reportedly to play music, read the weather forecast and set timers or reminders.